Universal scaling of resolution with photon number in superresolution fluorescence microscopy






Superresolution fluorescence microscopy techniques beat the diffraction limit, enabling ultra-high resolution imaging in biological physics and nanoscience. In all cases that have been studied experimentally, the resolution scales inversely with the square root of some parameter that measures the number of photons used. However, this ubiquitous limit arises from very distinct mechanisms in different approaches, raising the question of whether it is a fundamental limit that cannot be exceeded, or merely a coincidence of the techniques studied thus far. We demonstrate that, under very general assumptions that encompass essentially all fluorescence microscopy situations, the known resolution limit is indeed universal. Our model considers experiments that build up an image via any arbitrary sequence of steps compatible with our assumptions of (1) light that exhibits shot noise and (2) molecules that can be modeled with rate equations. A detailed examination of our assumptions shows that exceeding this resolution limit will require the use of quantum optical effects, pointing to an avenue for future innovation.

Superresolution techniques [1-6] can beat the diffraction limit […] in fluorescence microscopy, providing important tools for biological physics […] and nanoscience …



In methods that rely on saturation of a transition (e.g. STED [17], SSIM […]), the parabolic profile near the node of the illumination beam results in resolution scaling as $\lambda$ divided by the square root of an illumination intensity. Near the node, the intensity profile is a quadratic function of displacement, and detectable changes in signal occur over a distance given by $r^2 k^2 I_0 = I_\mathrm{sat}$, where $r$ is the distance from the node, $k = 2\pi/\lambda$ is the wavenumber, $I_0$ is proportional to the illumination power (i.e. the number of photons hitting the sample in a given time), and $I_\mathrm{sat}$ is the intensity at which the population in some energy level saturates. Consequently, the smallest resolvable feature scales as $\lambda/\sqrt{I_0}$ […].
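To make the crossover condition concrete, the short Python sketch below solves $r^2 k^2 I_0 = I_\mathrm{sat}$ for $r$. The function name `crossover_distance`, the 600 nm wavelength, and the choice to express $I_0$ in units of $I_\mathrm{sat}$ are illustrative assumptions, not values taken from the text; only the $1/\sqrt{I_0}$ trend is meaningful.

```python
import math


def crossover_distance(wavelength, i0, i_sat):
    """Distance r from the node at which r**2 * k**2 * I0 == I_sat, with k = 2*pi/wavelength."""
    k = 2.0 * math.pi / wavelength
    return math.sqrt(i_sat / i0) / k


# Illustrative (hypothetical) numbers: 600 nm light, intensities in units of I_sat.
for i0 in (1.0, 4.0, 16.0, 100.0):
    r = crossover_distance(600e-9, i0, 1.0)
    print(f"I0 = {i0:6.1f} I_sat -> r = {r * 1e9:6.2f} nm")
```

Each fourfold increase in $I_0$ halves $r$, which is the $\lambda/\sqrt{I_0}$ scaling stated above.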

We consider fluorescent molecules whose states respond to excitation beams in a manner describable with simple rate equations, and that are read out by detecting light exhibiting shot noise. The shot-noise assumption excludes the use of $N$ entangled photons [20], where resolution can scale as $\lambda/N$. The rate-equation assumption excludes detection of molecular positions via Rabi oscillations [21]. We do not explicitly consider negative index materials [22] or superresolving pupils [23, 24]. However, our analysis applies to these technologies if the width of the point spread function (PSF) replaces $\lambda$: the focal spots have finite width in any real implementation, and near minima and maxima the intensity must be a quadratic or higher-order function of position, so that the field has a continuous second derivative in the wave equation. Our analysis here only requires focal spots of finite width, with intensity profiles that are parabolic near minima and maxima.



…variety of different mechanisms, the key difference between superresolution and conventional fluorescence microscopy is that fluorescent molecules are not simply illuminated and read out. Instead, they are either controlled deterministically […], or else a stochastic control scheme is accompanied by substantial post-processing […]. In superresolution fluorescence methods the resolution scales as the wavelength $\lambda$ divided by the square root of a parameter proportional to the number of photons $N$ used in the experiment [3]. However, this ubiquitous $\lambda/\sqrt{N}$ limit arises from different mechanisms in different cases, raising the question of whether it is universal or merely coincidental. Some theoretical work has considered how the performance of stochastic methods is limited by several different factors […], but the universality of the inverse-square-root scaling law remains an open question. Here we show that this limit is indeed universal for any superresolution fluorescence microscopy technique built from an arbitrary combination of elementary steps, provided the experiment involves (1) light exhibiting shot noise and (2) molecules whose states can be modeled with simple rate equations.

With stochastic switching (PALM, STORM, etc. […]), an image is built by estimating individual molecular positions in each frame. The precision scales as $\lambda$ over the square root of the number of photons collected […]. Conceptually, one is determining the mean position of $N$ photons at the detector. These positions are independent random variables with standard deviation $\propto \lambda$, so the standard deviation of the mean scales as $\lambda/\sqrt{N}$. (A more rigorous derivation of this result invokes the proportionality between Fisher information and photon emission rate when the light sources exhibit shot noise [13].)
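The centroid argument can be checked with a small Monte Carlo sketch in Python. It assumes, for illustration only, that each detected photon lands at a position drawn from a Gaussian of width `sigma_psf` (standing in for a PSF of width $\propto\lambda$) and that the position is estimated by the simple mean of the detections rather than by the maximum-likelihood methods of the cited work; the function name and trial counts are hypothetical.

```python
import math
import random
import statistics


def centroid_spread(sigma_psf, n_photons, n_trials=2000, seed=0):
    """Empirical standard deviation of the mean position of n_photons detections.

    Each photon position is drawn independently from a Gaussian of width
    sigma_psf, an assumed stand-in for a PSF whose width is ~ lambda.
    """
    rng = random.Random(seed)
    centroids = []
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma_psf) for _ in range(n_photons)]
        centroids.append(statistics.fmean(xs))
    return statistics.pstdev(centroids)


sigma = 1.0  # PSF width in arbitrary units (proportional to lambda)
for n in (25, 100, 400):
    est = centroid_spread(sigma, n)
    print(f"N = {n:4d}: spread = {est:.4f}, sigma/sqrt(N) = {sigma / math.sqrt(n):.4f}")
```

The empirical spread tracks $\sigma/\sqrt{N}$ closely; quadrupling $N$ roughly halves it.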



We begin by considering generalizations of deterministic superresolution methods, e.g. STED and SSIM. In deterministic methods, superresolution is achieved by saturating a transition and reading out spontaneous emission from an excited state. Both STED and SSIM require only a single absorption event and a single downward radiative transition. We will consider whether it is possible to get resolution scaling as $\lambda/I_0^m$, for some power $m > 1/2$, by shuffling the molecules through a sequence of many transitions before reading out information via detection of spontaneous emission.

We assume molecules with an arbitrary set of energy levels, arbitrary lifetimes for radiative and nonradiative transitions, and arbitrary absorption cross-sections. Molecules are present in a 2D sample at a density $n(x,y)$, and light is read out in discrete steps by a scanning lens (assumed to be diffraction-limited) that is focused at $\mathbf{r}_0 = (x_0, y_0)$; the detector is at infinity to collect light from the smallest possible region. 3D sample depth will not be considered; the chief effect would be to contribute an out-of-focus background, and our effort here is to produce a best-case limit. We allow for the possibility of detection in multiple spectral channels, to distinguish different transitions of interest, and we allow for the possibility of time-resolved detection to distinguish processes with different lifetimes.

In a given spectral channel $i$ (corresponding to spontaneous emission from a given transition) at a time $t$, one detects the signal $S_i$:

$$S_i(t) = \int_{\text{sample}} n_i(\mathbf{r}', t)\, h(\mathbf{r} - \mathbf{r}')\, dV' = h * n_i \tag{1}$$

where $n_i$ is the density of molecules in excited level $i$, $h$ is the PSF of the collection lens, and $*$ denotes convolution.
To resolve a spatially inhomogeneous structure, one must 
look at changes in signal from one point to the next. The 
relevant quantity is: 



$$\frac{\partial S_i}{\partial x} = h * \frac{\partial n_i}{\partial x} \tag{2}$$

We will therefore be most interested in the regions of the sample where $n_i$ changes most rapidly.
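Equation (2) holds because differentiation and convolution are both linear, shift-invariant operations and therefore commute. A minimal discrete check in Python, assuming a periodic 1D grid, a forward finite difference in place of $\partial/\partial x$, a hypothetical logistic step for $n_i(x)$, and a hypothetical normalized Gaussian for $h$:

```python
import math


def circular_convolve(h, n):
    """Periodic convolution (h * n)[i] = sum_m h[m] * n[i - m + half], with h centred at half."""
    size, half = len(n), len(h) // 2
    return [sum(hm * n[(i - m + half) % size] for m, hm in enumerate(h)) for i in range(size)]


def forward_difference(f, dx):
    """Periodic forward-difference approximation to df/dx."""
    size = len(f)
    return [(f[(i + 1) % size] - f[i]) / dx for i in range(size)]


dx = 0.05
xs = [i * dx for i in range(200)]
# Hypothetical excited-state density with a sharp crossover at x = 5.
# (The periodic wrap adds an artificial downward step at the grid edge; it does not affect the check.)
n_i = [1.0 / (1.0 + math.exp(-(x - 5.0) / 0.1)) for x in xs]
# Hypothetical Gaussian collection PSF (41 samples, width 0.5), normalised to unit sum.
h_raw = [math.exp(-(((m - 20) * dx) ** 2) / (2 * 0.5 ** 2)) for m in range(41)]
norm = sum(h_raw)
h = [v / norm for v in h_raw]

d_signal = forward_difference(circular_convolve(h, n_i), dx)   # dS/dx
h_conv_dn = circular_convolve(h, forward_difference(n_i, dx))  # h * dn/dx
print("max |difference| =", max(abs(a - b) for a, b in zip(d_signal, h_conv_dn)))
print("steepest signal change at x =", xs[d_signal.index(max(d_signal))])
```

The two sides agree to rounding error, and the signal changes fastest where $n_i$ does.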

We also assume that the molecules are illuminated by some arbitrary set of beams, each with frequency $\omega_j = c k_j$ (where $k_j$ is the wavenumber of the beam) chosen to be tuned to some transition of the molecule. The beams are focused at positions $(x_j, y_j)$, not necessarily coinciding with the focus of the detection lens at $(x_0, y_0)$, and have intensity profiles of the form:

$$I_j(x, y, t) = I_0\, a_j(t)\, f_j(x - x_j,\, y - y_j) \tag{3}$$

Here $f_j(x,y)$ is the square of some non-evanescent solution to the wave equation, and $a_j(t)$ represents a possibly time-dependent modulation of the intensity, e.g. to perform STED by first raising molecules to the excited state and subsequently sending most of them to the ground state, or to switch a beam on and off to probe different transitions at different times. Allowing modulation of beam intensities means that we may be interested in integrals of $S_i(t)$ over specified time intervals. $I_0$ is an overall scaling parameter; it enables us to take a high-intensity limit by tuning a single parameter rather than treating each beam separately. Crucially, $I_0$ is proportional to the number of photons incident on the specimen.

We assume that the kinetics of the molecule can be modeled with rate equations. The temporal behavior of the level occupations $\{n_i(x,y,t)\}$ will be exponentially decaying transients plus a steady state:

$$n_i(x, y, t) = n_i^{(0)}(x, y) + \sum_{\beta \in \text{transients}} n_i^{(\beta)}(x, y)\, e^{-t/\tau_\beta} \tag{4}$$

where $\beta$ indexes the transients, $n_i^{(0)}(x,y)$ is the steady state, and $\tau_\beta$ is the lifetime of transient $\beta$. The spatial dependence of $n_i^{(0)}(x,y)$ and $n_i^{(\beta)}(x,y)$ is determined by the local values of beam intensities. Depending on how detection is time-gated, and how the intensities are modulated via $\{a_j(t)\}$ in Eq. (3), our signal $S_i$ may be dominated by the local value of either $n_i^{(0)}$ or $n_i^{(\beta)}$.
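As a minimal illustration of the form of Eq. (4), consider a hypothetical two-level molecule with an absorption rate $w \propto I_0 f_j$ out of the ground state and spontaneous decay with lifetime $\tau$ (stimulated emission and other levels are ignored here by assumption). Its rate equation has one transient plus a steady state, and the steady state approaches a limit independent of intensity once $w\tau \gg 1$. The Python sketch compares the closed form with direct Euler integration:

```python
import math


def excited_population(t, w, tau, n0=0.0):
    """Closed-form solution of dn/dt = w * (1 - n) - n / tau for a hypothetical two-level molecule.

    It has the structure of Eq. (4): a steady state plus a single decaying transient.
    """
    rate = w + 1.0 / tau  # inverse lifetime of the one transient
    n_ss = w / rate       # steady-state excited population
    return n_ss + (n0 - n_ss) * math.exp(-rate * t)


def euler(t_end, w, tau, steps=100_000, n0=0.0):
    """Forward-Euler integration of the same rate equation, used as a cross-check."""
    dt, n = t_end / steps, n0
    for _ in range(steps):
        n += dt * (w * (1.0 - n) - n / tau)
    return n


tau = 1.0
for w in (0.1, 1.0, 10.0, 1000.0):  # w grows in proportion to the local intensity
    print(f"w*tau = {w * tau:7.1f}: n(t=0.5) = {excited_population(0.5, w, tau):.4f} "
          f"(Euler {euler(0.5, w, tau):.4f}), steady state = {w / (w + 1.0 / tau):.4f}")
```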

As we increase $I_0$, irrespective of whether we are detecting a transient signal or a steady-state signal, the relevant coefficient in Eq. (4) saturates at some limiting value that is independent of $I_0$. Consequently, $n_i$ can only depend on ratios of local beam intensities. These ratios vary on length scales of $\sim\lambda$ everywhere except near the nodes of beams. At a node, molecules do not "see" the beam, and very close to the node the population is in a weak-field limit (with respect to that beam) rather than an asymptotic strong-field limit. Thus, the most rapid spatial variation of the coefficients in Eq. (4) occurs near nodes, where there is a crossover between different limiting behaviors. The widths of the crossover regions can be determined by assuming that near the nodes the intensity is a quadratic function of position:

$$I_j(x, y, t) \approx I_0\, a_j(t)\, k_j^2 \left(c_x x^2 + c_y y^2\right) \tag{5}$$

(We assume a coordinate system in which the quadratic form is diagonalized.) The crossover happens when the intensity is comparable to some saturation intensity $I_\mathrm{sat}$. For displacements away from the node in the $x$ direction the crossover happens at:

$$\delta x \approx \frac{1}{k_j}\sqrt{\frac{I_\mathrm{sat}}{I_0\, a_j c_x}} = \frac{\lambda_j}{2\pi}\sqrt{\frac{I_\mathrm{sat}}{I_0\, a_j c_x}} \tag{6}$$

We thus find that the length scale over which the signal changes rapidly, and hence the length scale of the features in the data, is proportional to $\lambda$ divided by the square root of a measure of the number of photons used.
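The crossover estimate in Eq. (6) can be reproduced numerically. The Python sketch below assumes, purely for illustration, a saturable steady state $1/(1 + I/I_\mathrm{sat})$ for the population that survives a depletion beam whose intensity near the node follows Eq. (5) with $a_j = c_x = 1$, $\lambda = 1$, and $y = 0$; it locates the half-point by bisection and fits the power law relating $\delta x$ to $I_0$.

```python
import math


def surviving_fraction(x, i0, k=2 * math.pi, c_x=1.0, i_sat=1.0):
    """Hypothetical saturable steady state 1 / (1 + I / I_sat) near a node.

    The depletion intensity follows Eq. (5) along y = 0 with a_j = 1:
    I = I0 * k**2 * c_x * x**2. Default k = 2*pi corresponds to lambda = 1.
    """
    intensity = i0 * k ** 2 * c_x * x ** 2
    return 1.0 / (1.0 + intensity / i_sat)


def crossover_width(i0, hi=10.0, tol=1e-12):
    """Bisection for the x > 0 at which the surviving fraction falls to 1/2."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surviving_fraction(mid, i0) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


widths = {i0: crossover_width(i0) for i0 in (10.0, 100.0, 1000.0, 10000.0)}
for i0, w in widths.items():
    print(f"I0 = {i0:8.0f}: delta_x = {w:.5f}, sqrt(I0) * delta_x = {math.sqrt(i0) * w:.5f}")

# Fit delta_x ~ I0**(-p) between the lowest and highest intensities; p should be close to 1/2.
p = -math.log(widths[10000.0] / widths[10.0]) / math.log(10000.0 / 10.0)
print(f"fitted exponent p = {p:.4f}")
```

The product $\sqrt{I_0}\,\delta x$ stays constant at $1/(2\pi)$, matching Eq. (6) in these units.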

If the beam profile is non-parabolic near the node (e.g. $r^4$) we could proceed similarly, but instead of getting $1/\sqrt{I_0}$ in our result we would get $1/I_0^{1/4}$ or some other (lower) power of $I_0$. This width would decrease more slowly for $I_0 \to \infty$, giving worse scaling between resolution and intensity. One cannot use a node where the intensity scales as $x^n$ ($n < 2$), as that would imply an electric field that scales as $x$ to a power $< 1$, giving a discontinuous derivative in the wave equation.
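A quick way to see the effect of the node shape is to assume, hypothetically, a near-node intensity $I_0 |x|^q$ (in units where $k = 1$) and solve for where it reaches $I_\mathrm{sat}$; the log-log slope then gives the exponent of $I_0$ in the crossover width. The function names are illustrative.

```python
import math


def node_crossover(i0, q, i_sat=1.0):
    """Distance from a node at which I0 * |x|**q reaches I_sat (x in units of 1/k)."""
    return (i_sat / i0) ** (1.0 / q)


def fitted_exponent(q, i0_lo=1e2, i0_hi=1e6):
    """Slope of log(width) versus log(I0) between two illustrative intensities."""
    w_lo, w_hi = node_crossover(i0_lo, q), node_crossover(i0_hi, q)
    return math.log(w_hi / w_lo) / math.log(i0_hi / i0_lo)


for q in (2, 4, 6):
    print(f"intensity ~ |x|^{q}: width ~ I0^({fitted_exponent(q):+.3f})")
```

The parabolic node ($q = 2$) gives an exponent of $-1/2$, while a quartic node ($q = 4$) gives only $-1/4$, so its width shrinks more slowly with $I_0$.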

In considering whether a structure can be resolved, we must also ask whether translating the collection lens by a distance $\delta x$ produces a change in signal greater than the fluctuations of the noise in the signal. We get a condition for the smallest resolvable feature if we equate the change in signal, $\delta x\,\partial S/\partial x$, with the square root of the signal (assuming shot noise). For large $I_0$, the spontaneous emission rate saturates at one photon per excited-state lifetime $\tau$, so the signal saturates at a value proportional to $\Delta t/\tau$, where $\Delta t$ is the acquisition time. The derivative of the signal scales as $S\sqrt{I_0}/\lambda$. Putting this together gives:

$$\delta x \propto \frac{\lambda}{\sqrt{I_0\,\Delta t}} \tag{7}$$

We thus see that the smallest resolvable feature size scales inversely with the square root of a measure of the number of illumination photons ($I_0$) and also with the square root of a measure of the number of photons collected in the experiment ($\Delta t$).
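Written out step by step, the argument leading to Eq. (7) is the following sketch, which keeps only the proportionalities stated in the text and absorbs all constants:

```latex
\begin{align}
  \delta x \, \frac{\partial S}{\partial x} &\approx \sqrt{S}
    && \text{(signal change} = \text{shot noise)} \\
  \frac{\partial S}{\partial x} &\propto \frac{S\sqrt{I_0}}{\lambda}
    && \text{(crossover width} \propto \lambda/\sqrt{I_0}) \\
  \Rightarrow \quad \delta x &\propto \frac{\lambda}{\sqrt{I_0}\,\sqrt{S}}
    \propto \lambda \sqrt{\frac{\tau}{I_0\,\Delta t}}
    && \text{(using } S \propto \Delta t/\tau)
\end{align}
```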

According to Eq. (7), if we examine the Fourier transform of an image built by scanning and collecting spontaneous emission in our scheme, the largest spatial frequency component distinguishable from noise is $k_\mathrm{max} \propto \sqrt{I_0\,\Delta t}/\lambda$. If we were to try to extract additional information by taking linear combinations of measurements at different positions, the Fourier transform of the image built from these linear combinations would still have a finite width in frequency space, scaling as $\sqrt{I_0\,\Delta t}/\lambda$. Alternatively, one could take nonlinear combinations of signals, e.g. multiply signals shifted in time or space [25]. In position space, the key quantities of interest will be peaks in either the nonlinear combination or a spatial derivative thereof. If the peaks have quadratic maxima, we can approximate them locally with Gaussians. Multiplying $m$ Gaussians of the form $e^{-x^2/\sigma^2}$ gives a function of the form $e^{-m x^2/\sigma^2}$, with width $\sigma/\sqrt{m}$. If we work with signals shifted in time, the factor $m$ is proportional to the number of times that a measurement is performed, and is hence again proportional to the number of photons used in the experiment. We thus conclude that post-processing cannot improve the scaling between resolution and photon count.
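The claim that multiplying $m$ Gaussian-shaped peaks narrows them by $1/\sqrt{m}$ can be checked numerically. In this Python sketch each peak is modeled, as an assumption, by $e^{-x^2/\sigma^2}$ with $\sigma = 1$, and the half-width at half-maximum of the $m$-fold product is found by bisection:

```python
import math


def gaussian(x, sigma):
    """Peak normalised to 1 at x = 0, of the form exp(-x**2 / sigma**2)."""
    return math.exp(-x * x / sigma ** 2)


def half_width_half_max(f, hi=10.0, tol=1e-12):
    """Bisection for the x > 0 at which a peak with f(0) = 1 falls to 1/2."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma = 1.0
base = half_width_half_max(lambda x: gaussian(x, sigma))
for m in (1, 2, 4, 16):
    width = half_width_half_max(lambda x, m=m: gaussian(x, sigma) ** m)
    print(f"m = {m:2d}: width ratio = {width / base:.4f}, 1/sqrt(m) = {1 / math.sqrt(m):.4f}")
```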

Instead of detecting spontaneous emission, one could also detect photons emitted via a coherent response to the external driving field, e.g. stimulated emission [26] or nonlinear processes like harmonic generation and CARS [27, 28]. Nonlinear microscopy is usually performed far from a saturated regime, i.e. in a regime in which the response of the specimen can be modeled as either a power of the incident intensity (in harmonic generation) or a product of different beam intensities (e.g. in CARS). For unsaturated nonlinear microscopy with a single beam or multiple co-focused beams, the resolution is known to be enhanced by only a factor of $1/\sqrt{m}$, where $m$ is the order of the nonlinearity (number of simultaneously absorbed photons), due to the parabolic nature of the intensity maxima [29]. However, a scaling of signal as $I^m$ ($m > 1$) cannot be sustained for arbitrarily large incident powers; eventually energy conservation would be violated.

If coherence is maintained in the saturated regime, the 
detected intensity is not added linearly from the different 
regions of the focal area. Instead, the amplitude A is a 
coherent sum of contributions from different parts of the 
sample. The amplitude at the surface of the detector 
can be described by an amplitude point spread function (aPSF [29]). At the detector, the local amplitude is the aPSF-weighted sum of the local fields at each point in the focal region. We can easily extend the treatment in the previous section to cover this case, assuming again illumination by some arbitrary set of beams, each having an amplitude proportional to $\sqrt{I_0}$, and that in the vicinity of a node the amplitude of each field component is a linear function of the displacement from the node.

Each point in the specimen contributes to the signal amplitude in an amount $dA$, and in the limit of large $I_0$ energy conservation requires that $dA$ be proportional to $\sqrt{I_0}$. The ratio of $dA$ to $\sqrt{I_0}$ saturates as a function of local beam amplitudes, varying rapidly only near nodes, as discussed above. As the beam is scanned, the largest change in signal thus occurs when a node is scanned through the position of a molecule. By the same arguments as above, the largest changes in signal happen in a region of size $\delta x \propto \lambda/\sqrt{I_0}$.

As before, we also need to consider whether the change in signal exceeds the noise. The signal intensity now saturates at a value proportional to $I_0\,\Delta t$ rather than (as in the spontaneous emission case) a value independent of $I_0$, so the noise is proportional to $\sqrt{I_0\,\Delta t}$. We thus set $\delta x\, I_0\Delta t/(\lambda/\sqrt{I_0}) = \sqrt{I_0\,\Delta t}$ and get:

$$\delta x \propto \frac{\lambda}{I_0\sqrt{\Delta t}} \tag{8}$$



The denominator now contains a factor proportional to 
the number of photons incident in the experiment. How- 
ever, the resolution still scales inversely with the square 
root of the number of photons detected. This is the key 
difference between the cases of spontaneous and stimulated transitions: because the photon emission rate is no longer bounded by the inverse lifetime of a state, a larger number of photons can be collected in a time $\Delta t$.
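The contrast between Eqs. (7) and (8) can be made explicit with a toy Python calculation that sets every proportionality constant to 1 (an assumption; only the exponents are meaningful). Both functions apply the same shot-noise criterion, $\delta x\,\partial S/\partial x = \sqrt{S}$ with $\partial S/\partial x \propto S\sqrt{I_0}/\lambda$, and differ only in whether the saturated signal is independent of $I_0$ or proportional to it:

```python
import math


def delta_x_spontaneous(wavelength, i0, dt, tau=1.0):
    """Eq. (7)-type estimate: saturated spontaneous emission gives S ~ dt / tau."""
    signal = dt / tau
    slope = signal * math.sqrt(i0) / wavelength  # dS/dx
    return math.sqrt(signal) / slope             # solves delta_x * dS/dx = sqrt(S)


def delta_x_coherent(wavelength, i0, dt):
    """Eq. (8)-type estimate: a coherent (stimulated) signal gives S ~ I0 * dt."""
    signal = i0 * dt
    slope = signal * math.sqrt(i0) / wavelength
    return math.sqrt(signal) / slope


def log_log_slope(delta_x, i0_lo=1e2, i0_hi=1e4):
    """Exponent of I0 in delta_x, at fixed wavelength = 1 and dt = 1."""
    ratio = delta_x(1.0, i0_hi, 1.0) / delta_x(1.0, i0_lo, 1.0)
    return math.log(ratio) / math.log(i0_hi / i0_lo)


print(f"spontaneous: {log_log_slope(delta_x_spontaneous):.3f}")  # -0.500
print(f"coherent:    {log_log_slope(delta_x_coherent):.3f}")     # -1.000
```

In both cases quadrupling $\Delta t$ halves $\delta x$; only the dependence on $I_0$ changes.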

Let us now consider localization-based approaches, which typically use one [30] or two […] illumination beams to perform the tasks of switching molecules between activated and dark states and exciting fluorescence from those molecules currently in the activated states. Neglecting pixelation and out-of-focus background, the fundamental limit to localization precision scales as $\lambda/\sqrt{N}$. There are two ways that one might try to surpass this limit. One could attempt to confine activation and excitation to a sub-$\lambda$ region (via some control scheme analogous to those considered above), and use that confined activation as prior information on the molecule's position, obtaining a maximum a posteriori estimate [31] of position. Alternatively, one might attempt to use a sequence of beams in a control scheme that increases the product of the photon emission rate and the time spent in the activated state before returning to the dark state. In the latter case, the resolvable feature size will still be inversely proportional to the square root of the number of photons collected, but one can ask whether it would at least improve by increasing the illumination intensity $I_0$.

In the first approach, using some sequence of illumination steps to confine activation to a small region, the linear dimension of that region will (as discussed above) scale as $\lambda/\sqrt{I_0}$. We can approximate the prior information on the molecule position as a function with a quadratic maximum with width $\propto \lambda/\sqrt{I_0}$. The conditional likelihood of the data given that the molecule is at $x$ is also known to have a quadratic maximum with width $\lambda/\sqrt{N}$, where $N$ is the number of photons detected […]. When these are multiplied to get the posterior probability of the position given the data [31], we get another function with a quadratic maximum, and the second-order term in its expansion is:

$$\left(\gamma_1 N + \gamma_2 I_0\right)\left(\frac{x}{\lambda}\right)^2 \tag{9}$$

where the $\gamma$ coefficients contain all necessary factors of $\pi$, saturation intensities, etc. The width scales inversely as the square root of a linear combination of $N$ and $I_0$, and so we again have a localization precision that scales as $\lambda$ divided by the square root of some measure of the number of photons used.
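Because multiplying two functions with quadratic maxima adds their second-order coefficients, the posterior width follows directly from Eq. (9). A Python sketch, approximating both the prior and the likelihood as Gaussians and treating $\gamma_1$, $\gamma_2$ as hypothetical constants set to 1:

```python
import math


def posterior_width(wavelength, n_photons, i0, gamma1=1.0, gamma2=1.0):
    """Standard deviation of the product of two Gaussians.

    Prior (confined activation): variance wavelength**2 / (gamma2 * I0).
    Likelihood (N detected photons): variance wavelength**2 / (gamma1 * N).
    Inverse variances add, which mirrors the sum gamma1*N + gamma2*I0 in Eq. (9).
    """
    prior_precision = gamma2 * i0 / wavelength ** 2
    likelihood_precision = gamma1 * n_photons / wavelength ** 2
    return 1.0 / math.sqrt(prior_precision + likelihood_precision)


for n, i0 in ((100, 1.0), (100, 100.0), (100, 10000.0), (10000, 100.0)):
    print(f"N = {n:5d}, I0 = {i0:7.0f}: width = {posterior_width(1.0, n, i0):.4f} lambda")
```

Increasing either $N$ or $I_0$ narrows the posterior only through the square root of their weighted sum, so the inverse-square-root form is unchanged.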

In the second approach, we can try to increase the 
number of photons collected by either increasing the pho- 
ton emission rate or decreasing the rate of passage from 
an activated state (one that can fluoresce) to the dark 
state (one that cannot fluoresce). In the best case (stimulated emission) the photon emission rate is proportional to $I_0$. If return to the dark state is via a stimulated transition, then the ratio of photon emission rate to rate of return would be independent of $I_0$. Consequently, to achieve the best possible scaling of resolution with $I_0$,
one would need a molecule that returns to the dark state 
via a spontaneous transition. The rate of return to the 
dark state will hence be proportional to the probability 
of being in a bottleneck state. A bottleneck state will 
be one that can undergo a spontaneous transition either 
to the dark state or to another state that undergoes a 
sequence of transitions that always lead back to the dark 
state. The only remaining question, in terms of optimiz- 
ing the scaling of resolution with illumination intensity, 
is whether the probability of being in a bottleneck state 
can be driven to zero. 

If a molecule emits many photons before returning to the dark state, we can assume that at any particular time the probability $p_b$ of being in a bottleneck state is steady. (This statement is conditional on the knowledge that the molecule is not yet in the dark state.) It follows that the rate of transitions (upward or downward) into that state will be equal to the rate of transitions out of the state. This requires solving equations of the form

$$p_b\left(k_\mathrm{spont} + I_0\, k_\mathrm{induced}\right) = R_\mathrm{spont} + R_\mathrm{induced},$$

where the $k$ parameters are rate constants for spontaneous and induced transitions, and the $R$ parameters are rates of transitions into a bottleneck state, summed over all states that can reach it. We get $p_b = (R_\mathrm{spont} + R_\mathrm{induced})/(k_\mathrm{spont} + I_0\, k_\mathrm{induced})$.

For large $I_0$, the limiting value of $p_b$ is nonzero, since $R_\mathrm{induced}$ is proportional to $I_0$. Thus, the rate of return to the dark state cannot be driven to zero. The total number of photons emitted by an activated fluorophore can therefore only scale as $I_0$, and the resolution of the reconstructed image will scale as $\lambda/\sqrt{I_0}$.
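The steady-state argument can be sketched numerically. The Python below evaluates $p_b$ from the balance equation above, under the hypothetical assumptions that $R_\mathrm{induced} = r I_0$ for some constant $r$, that the stimulated photon emission rate is $\propto I_0$, and that the return rate to the dark state is $k_\mathrm{dark}\,p_b$; all rate constants are made-up illustrative values.

```python
def bottleneck_probability(i0, k_spont, k_induced, r_spont, r_induced_per_i0):
    """Solve p_b * (k_spont + I0 * k_induced) = R_spont + R_induced for p_b,
    assuming (hypothetically) R_induced = r_induced_per_i0 * I0."""
    return (r_spont + r_induced_per_i0 * i0) / (k_spont + i0 * k_induced)


def photons_before_dark(i0, params, emission_per_i0=1.0, k_dark=1.0):
    """Photons per activation: emission rate (~I0) divided by return rate k_dark * p_b."""
    return emission_per_i0 * i0 / (k_dark * bottleneck_probability(i0, **params))


# Hypothetical rate constants in arbitrary units.
params = dict(k_spont=1.0, k_induced=0.5, r_spont=0.2, r_induced_per_i0=0.05)
for i0 in (1.0, 1e2, 1e4, 1e6):
    p_b = bottleneck_probability(i0, **params)
    print(f"I0 = {i0:8.0e}: p_b = {p_b:.5f}, photons = {photons_before_dark(i0, params):.3e}")
# p_b tends to r_induced_per_i0 / k_induced = 0.1, not zero, so photons grow only linearly in I0.
```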

In conclusion, we have shown that in any fluorescence microscopy experiment that satisfies a few simple assumptions (conditions that are ubiquitous in fluorescence experiments in biology and nanoscience), the best achievable resolution scales as the wavelength of light divided by the square root of a measure of the number of photons used in the experiment (aside from one borderline case). Further innovation with common fluorescence tools therefore cannot improve this scaling of superresolution with photon number. Our analysis does not consider coherent quantum effects, which are known to enable resolution scaling inversely with photon number. Thus, beating the limit of $\lambda/\sqrt{N}$ will require collaboration between the biomedical optics and quantum optics communities. The feasibility of using coherent quantum effects to achieve resolution scaling better than $1/N$ requires a separate analysis.